DETAILED ACTION



Claim Rejections - 35 USC § 102



1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States.

2. Claims 1, 2, 6, 8, 9, 10, 14 and 15 are rejected under 35 U.S.C. 102(b) as being
anticipated by Alshawi (U.S. Patent 5815196).

Regarding claims 1 and 9, Alshawi discloses an integrated method and apparatus
for providing real-time subtitles [captioning] in an AV signal. The disclosure 
includes the automatic conversion of an audio [including speech] signal in the AV 
signal to text [caption] data and associating the audio and text [caption] data at a 
time that corresponds to the video signal. Alshawi describes in Fig 1 a video- 
based communications device (5,8). The device provides segmentation of an AV 
signal (16) and the further processing of the audio [speech] portion of the signal 
to provide continuous speech-to-subtitles [speech-to-text] translation (19,21,22) 
that has the ability to overlay and display text subtitles onto AV signal in real-time 
[captioning](26). 

Regarding claims 2 and 10, Alshawi discloses a method and apparatus that
captures an AV signal and further provides the audio [speech] portion of the 
signal for conversion to text. Alshawi describes a videophone receiver that has 
an input signal that comprises a camera that represents the visual component of 
the communication and a microphone that represents the audio component of 
the signal that have been encoded. (Col 2, 33 - 40). In addition, Alshawi 
describes an audio/video decoder that accepts an AV input and separates the 
signal into two entities, video signal and audio signal (Col 2, 51 - 55). 

Regarding claims 6 and 14, Alshawi discloses a display that shows at least the 
video and text [caption] data. Alshawi describes simultaneously displaying the 
sending party's video overlaid with real-time subtitles [caption] that translates the 
sender's speech (Col 3, 26 - 29). 

Regarding claims 8 and 15, Alshawi discloses a method and apparatus for 
translating speech and caption into a second language. Alshawi describes an 
embodiment where the textual signal is translated into a target language that is 
then overlaid onto the video signal as real-time subtitles [caption] (Col 3, 46). 

Claim Rejections - 35 USC § 103

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148
USPQ 459 (1966), that are applied for establishing a background for determining 
obviousness under 35 U.S.C. 103(a) are summarized as follows: 

1. Determining the scope and contents of the prior art.

2. Ascertaining the differences between the prior art and the claims at issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating 
obviousness or nonobviousness. 

5. Claims 3 and 11 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Alshawi (U.S. Patent 5815196) in view of Bozdagi et al. (U.S. Patent 6647535). 

Regarding claims 3 and 11, Alshawi discloses an integrated method and 
apparatus for providing real-time subtitles [captioning] in an AV signal. The 
disclosure includes the automatic conversion of an audio [including speech] 
signal in the AV signal to text [caption] data and associating the audio and text 
[caption] data at a time that corresponds to the video signal. Alshawi describes in 
Fig. 1 a video-based communications device (5,8). The device provides
segmentation of an AV signal (16) and the further processing of the audio
[speech] portion of the signal to provide continuous speech-to-subtitles
[speech-to-text] translation (19,21,22) that has the ability to overlay and
display text subtitles onto AV signal in real-time [captioning](26).

Alshawi does not show a method of converting the audio portion of the signal to 
text data that checks whether the amount of caption data is greater than a 
threshold amount or an expiration time before the process of association occurs. 
Bozdagi et al. show a system and method to enable real-time and near real-time
storyboarding on the world wide web. Bozdagi et al. teach the use of processing a
multimedia document which summarizes the original video by placing
representative static images and text into a web document for viewing (Col 2, 5).
In addition, the device can control the number of representative images
transferred to be displayed by the use of a threshold (Col 5, Lines 45-55). Also,
time is used to check the change in intensity between representative images (Col
6, 7). This gives the advantage of greater flexibility in viewing multimedia and
reduces the overall demand for bandwidth.

Therefore, it would have been obvious to one of ordinary skill at the time of the
invention to modify Alshawi by the use of parameters such as caption amount
and time threshold as taught by Bozdagi et al. that show the benefits of the
association of text and images for multimedia documents which may include AV
signals.

6. Claims 4, 5, 7, 12, 13, 17, 18, 20, 21, 22, 23 and 24 are rejected under 35 U.S.C.
103(a) as being unpatentable over Alshawi (U.S. Patent 5815196) in view of
Kazeroonian et al. (International Application Number: PCT/US99/03028).

Regarding claims 4 and 12, Alshawi discloses an integrated method and
apparatus for providing real-time subtitles [captioning] in an AV signal. The
disclosure includes the automatic conversion of an audio [including speech]
signal in the AV signal to text [caption] data and associating the audio and text
[caption] data at a time that corresponds to the video signal. Alshawi describes in
Fig. 1 a video-based communications device (5,8). The device provides
segmentation of an AV signal (16) and the further processing of the audio
[speech] portion of the signal to provide continuous speech-to-subtitles
[speech-to-text] translation (19,21,22) that has the ability to overlay and
display text subtitles onto AV signal in real-time [captioning](26).

Alshawi does not show the synchronizing of the text [caption] data with one or 
more cues in the AV signal. However, Kazeroonian et al. teach a real-time 
process for synchronizing of textual data with an AV signal. Kazeroonian et al. 
describe acquiring and indexing [addition of cues] discrete scenes within the AV
signal. For each scene, the textual information related to a particular scene can
be determined using a speech recognizer [speech-to-text processor] on the audio
portion of the signal [and which is executed on a computer with a stored
recordable medium] (Page 13, Line 33). In highly dynamic real time video this
indexing feature is important in synchronizing the AV signal to the shown textual 
[caption] data. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to further modify Alshawi with the addition of cues/indexing to 
the AV signal to help synchronize text [caption] and AV signal data as taught by 
Kazeroonian in order to improve on the real time captioning system for AV 
signals. 

Regarding claims 5 and 13, Alshawi discloses an integrated method and
apparatus for providing real-time subtitles [captioning] in an AV signal. The 
disclosure includes the automatic conversion of an audio [including speech] 
signal in the AV signal to text [caption] data and associating the audio and text 
[caption] data at a time that corresponds to the video signal. Alshawi describes in 
Fig. 1 a video-based communications device (5,8). The device provides
segmentation of an AV signal (16) and the further processing of the audio
[speech] portion of the signal to provide continuous speech-to-subtitles
[speech-to-text] translation (19,21,22) that has the ability to overlay and
display text subtitles onto AV signal in real-time [captioning](26).

Alshawi does not show the embedding [encoding] of the text [caption] data within 
the AV signal. Instead, a subtitle generator (24, Fig. 1) is used to overlay text data 
onto the AV signal. However, Kazeroonian et al. teach a real-time process for 
embedding of text into an audio-video signal. Kazeroonian et al. describe 
acquiring and indexing discrete scenes within the AV signal. For each scene, the 
textual information related to a particular scene can be determined using a 
speech recognizer [speech-to-text processor] on the audio portion of the signal 
(Page 13, Line 33). The AV data for each scene is stored in a database 
[executed on a computer with a stored recordable medium] where textual 
information can be associated with indexed AV frames. This data can be later 
accessed/presented in an embedded format where users can view the AV signal 
with the associated text [caption] data (Page 13, Line 12).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to further modify Alshawi by embedding [associating] text data 
with AV signal data using a database as taught by Kazeroonian et al. in order to 
improve on the real time captioning system for AV signals. 

Regarding claims 7, 17 and 23, Alshawi discloses an integrated method and
apparatus for providing real-time subtitles [captioning] in an AV signal. The 
disclosure includes the automatic conversion of an audio [including speech] 
signal in the AV signal to text [caption] data and associating the audio and text 
[caption] data at a time that corresponds to the video signal. Alshawi describes in 
Fig. 1 a video-based communications device (5,8). The device provides
segmentation of an AV signal (16) and the further processing of the audio
[speech] portion of the signal to provide continuous speech-to-subtitles
[speech-to-text] translation (19,21,22) that has the ability to overlay and
display text subtitles onto AV signal in real-time [captioning](26).

Alshawi does not show the ability to store the video and associated text [caption] 
data on a recordable medium. However, Kazeroonian et al. teach a real-time 
process for storing of audio-video data with the associated text. Kazeroonian et 
al. describe acquiring and indexing discrete scenes within the AV signal. For 
each scene, the textual information related to a particular scene can be 
determined using a speech recognizer [speech-to-text processor] on the audio 
portion of the signal (Page 13, Line 33). The AV data for each scene is stored in 
a database stored on a media server where textual information can be 
associated with indexed AV frames. 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of invention to further modify Alshawi by the use of a database stored on a 
recordable medium as taught by Kazeroonian et al. in order to improve on the 
capability of a captioning system for AV signals. 

Regarding claim 18, the modified Alshawi (see rejection for claim 7) discloses a
method and apparatus that captures an AV signal and further provides the audio 
[speech] portion of the signal for conversion to text. Alshawi also describes a 
videophone receiver that has an input signal that comprises a camera that 
represents the visual component of the communication and a microphone that 
represents the audio component of the signal that have been encoded. (Col 2, 33 
- 40). In addition, Alshawi describes an audio/video decoder that accepts an AV 
input and separates the signal into two entities, video signal and audio signal 
(Col 2, 51 - 55).

Regarding claim 20, the modified Alshawi (see rejection to claim 17) discloses
the use of a closed captioning system using a computer recordable stored 
medium. The modified Alshawi (see rejection to claim 4) also shows the 
obviousness of using cues in the AV signal as a means of synchronizing text 
data with the AV signal. 

Regarding claim 21, the modified Alshawi (see rejection to claim 17) discloses
the use of a closed captioning system using a computer recordable stored 
medium. The modified Alshawi (see rejection to claim 5) also shows the
obviousness of embedding text data with the AV signal. 

Regarding claim 22, the modified Alshawi discloses (see rejection for claim 17) a
display that shows at least the video and caption data. Alshawi also describes 
simultaneously displaying the sending party's video overlaid with real-time 
subtitles [caption] that translates the sender's speech (Col 3, 26 - 29). 

Regarding claim 24, the modified Alshawi discloses (see rejection for claim 17)
a method and apparatus for translating speech and caption into a second 
language. Alshawi also describes an embodiment where the textual signal is 
translated into a target language that is then overlaid onto the video signal as 
real-time subtitles [caption] (Col 3, 46). 

7. Claim 16 is rejected under 35 U.S.C. 103(a) as being unpatentable over Alshawi 
(U.S. Patent 5815196) in view of Kirkland et al. (U.S. Patent 5900908). 

Regarding claim 16, Alshawi discloses an integrated method and apparatus for 
providing real-time subtitles [captioning] in an AV signal. The disclosure includes 
the automatic conversion of an audio [including speech] signal in the AV signal to
text [caption] data and associating the audio and text [caption] data at a time that
corresponds to the video signal. Alshawi describes in Fig. 1 a video-based
communications device (5,8). The device provides segmentation of an AV signal
(16) and the further processing of the audio [speech] portion of the signal to
provide continuous speech-to-subtitles [speech-to-text] translation (19,21,22) that
has the ability to overlay and display text subtitles onto AV signal in real-time
[captioning](26). Alshawi does not show portability and the utilization of the
device in the classroom. However, Kirkland teaches a method of
encoding caption data into the program signal. The apparatus receives a
television signal with various description data including caption data (Col 9, 15).
The device itself is a set-top box that can be co-located with a television
[portable] and which can be used for live performances, classrooms and other
types of presentations (Col 3, 29 and Col 10, 60). Devices with such features help
the handicapped or physically impaired by providing text or audio services.

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to further modify Alshawi by making it portable for use in such 
venues as a classroom as taught by Kirkland in order to improve on the capability of
the captioning system for use in the classroom. 



Application/Control Number: 09/800,212

Art Unit: 2655 

8. Claim 19 is rejected under 35 U.S.C. 103(a) as being unpatentable over Alshawi 
(U.S. Patent 5815196) in view of Kazeroonian et al. (International Application Number: 
PCT/US99/03028) and further in view of Bozdagi et al. (U.S. Patent 6647535). 

Regarding claim 19, the modified Alshawi (see rejection to claim 17) discloses
the use of a closed captioning system using a computer recordable stored 
medium. The modified Alshawi (see rejection to claim 3) also shows the 
obviousness of using caption data amount and time as a means of associating 
text data to AV signal. 



Conclusion 



12. Any response to this action should be mailed to: 

Commissioner of Patents and Trademarks 
Washington, D.C. 20231 
or faxed to: 
(703)872-9314,

(for informal or draft communications, please label "PROPOSED" or 
"DRAFT") 

Hand-delivered responses should be brought to Crystal Park II, 2121 Crystal 
Drive, Arlington, VA, Sixth Floor (Receptionist).

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Michael Lewis, telephone number (703)305-8730. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Ms. Doris To, can be reached at (703)305-4827. The facsimile phone
number for this group is (703)872-9314. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the Group 2600 receptionist whose telephone number 
is (703) 305-4750. The 2600 Customer Service telephone number is (703) 306-0377.



mal 

10/9/2003 




DORIS H. TO
SUPERVISORY PATENT EXAMINER
TECHNOLOGY CENTER 2800



